The conventional definition of a depth function is vector-based. In this paper, a novel projection 
depth (PD) technique directly based on tensors, such as matrices, is instead proposed. Tensor 
projection depth (TPD) is still an ideal depth function and its computation can be achieved 
through the iteration of PD. Furthermore, we also discuss the cases for sparse samples and 
higher order tensors. Experimental results in data classification with the two projection depths 
show that TPD performs much better than PD for data with a natural tensor form, and even 
when the data have a natural vector form, TPD appears to perform no worse than PD. 

Keywords: data depth; Rayleigh projection depth; statistical depth; tensor-based projection 
depth 

1. Introduction 

In the last ten years, statistical depth functions have increasingly served as a useful 
tool in multidimensional exploratory data analysis and inference. The depth of a point 
in the multidimensional space measures the centrality of that point with respect to a 
multivariate distribution or a given multivariate data cloud. Depth functions have been 
successfully used in many fields, such as quality indices [17, 20], multivariable regression 
[24], limiting p values [18], robust estimation [3], nonparametric tests [4] and discrimi- 
nant analysis [6, 11, 12, 14]. Some common statistical depths which have been defined 
include half-space depth [25], simplicial depth [19], projection depth [7, 8, 23, 29], spatial 
depth [26], spatial rank depth [10] and integrated dual depth [5]. Compared to the others, 
projection depth (PD) is preferable because of its good properties such as robustness, 
affine invariance, maximality at center, monotonicity relative to the deepest point, vanishing 
at infinity and so on. 

However, almost all the depths proposed in the literature to date are defined over vector 
spaces, even though not all observations are naturally in vector form. In the real world, 
the extracted features of an object often have a specialized structure, and such structures 
take the form of a second- or even higher-order tensor. For example, a captured image is a 
second-order tensor, that is, a matrix, and sequential data, such as a video sequence for 
event analysis, take the form of a third-order tensor. It would be desirable to keep the 
underlying structures of the data unchanged during the data analysis. 

Most of the previous work on depth has first transformed the input tensor data into 
vectors, which in fact changes the underlying structure of the data sets. At the same time, 
such a transformation often leads to the curse of dimensionality problem and the small 
sample size problem since most depth functions (such as Mahalanobis depth) require the 
covariance matrix to be positive definite. 
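
The small sample size problem can be made concrete with a short numerical sketch. It is an illustration only, not part of the original experiments: the sample size, the 32 x 32 shape and the synthetic Gaussian data are assumptions chosen for the example. Vectorizing n = 25 matrices of size 32 x 32 yields a 1024 x 1024 sample covariance whose rank is at most n - 1, so it cannot be positive definite.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n1, n2 = 25, 32, 32                    # hypothetical sample size and image shape
samples = rng.normal(size=(n, n1, n2))    # synthetic stand-in for image data

vectors = samples.reshape(n, n1 * n2)     # vectorization: 32 x 32 -> 1024
cov = np.cov(vectors, rowvar=False)       # 1024 x 1024 sample covariance
rank = np.linalg.matrix_rank(cov)

# The centered data span at most n - 1 directions, so the covariance
# of the vectorized samples is singular.
print(cov.shape, rank)
```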

Therefore, it is necessary to extend the definition of depth to tensor spaces in order to 
process the data sets directly as tensors without modifying their structure. In 
fact, many tensor-based methods in discriminant analysis have been proposed and have 
led to many nice results [2, 27, 28]. In this paper, informed by the aforementioned works, 
we propose a tensor-based projection depth (TPD) in order to extend the definition of 
projection depth to tensor spaces. We will prove that TPD is still an ideal depth according 
to the criteria [30]. Also, we will explore the characteristics of high order tensor projection 
depth in theory. We will demonstrate that TPD allows us to avoid the above two problems 
when using vector representation. 

The paper is organized as follows. Section 2 briefly introduces tensor algebra. Section 3 
introduces the projection depth and gives the solution to the Rayleigh projection depth. 
Section 4 gives the definition of tensor projection depth and discusses its properties. 
Section 5 supplies the algorithm for TPD and analyzes its convergence. Section 6 analyzes 
the special case of sparse samples. Section 7 discusses the TPD for higher order tensors. 
Section 8 gives numerical results for TPD. Section 9 concludes the paper, and proofs of 
selected theorems and propositions are given in the Appendix. 

2. Tensor algebra 

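A tensor $T$ of order $k$ is a real-valued multilinear function on $k$ vector spaces [13]: 

$$T: \mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_k} \to \mathbb{R}.$$

A multilinear function is linear as a function of each variable considered separately. The 
set of all $k$th-order tensors on $\mathbb{R}^{n_i}$, $i = 1, \ldots, k$, denoted by $\mathcal{T}^k$, is a vector space under 
the usual operations of pointwise addition and scalar multiplication: 

$$(aT)(\mathbf{a}_1, \ldots, \mathbf{a}_k) = a\,(T(\mathbf{a}_1, \ldots, \mathbf{a}_k)),$$
$$(T + T')(\mathbf{a}_1, \ldots, \mathbf{a}_k) = T(\mathbf{a}_1, \ldots, \mathbf{a}_k) + T'(\mathbf{a}_1, \ldots, \mathbf{a}_k),$$

where $\mathbf{a}_i \in \mathbb{R}^{n_i}$. 

Given two tensors, $S \in \mathcal{T}^k$ and $T \in \mathcal{T}^l$, their product, 

$$S \otimes T: \mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_{k+l}} \to \mathbb{R},$$

is defined as 

$$S \otimes T(\mathbf{a}_1, \ldots, \mathbf{a}_{k+l}) = S(\mathbf{a}_1, \ldots, \mathbf{a}_k)\, T(\mathbf{a}_{k+1}, \ldots, \mathbf{a}_{k+l}).$$

It is immediate from the multilinearity of $S$ and $T$ that $S \otimes T$ depends linearly on each 
argument separately, so it is a $(k + l)$th-order tensor. 

First-order tensors are simply vectors on $\mathbb{R}^{n_1}$. That is, $\mathcal{T}^1 = \mathcal{R}^{n_1}$, where $\mathcal{R}^{n_1}$ is the dual 
space of $\mathbb{R}^{n_1}$. A second-order tensor space is a product of two first-order tensor spaces, 
that is, $\mathcal{T}^2 = \mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$. Let $\mathbf{e}_1, \ldots, \mathbf{e}_{n_1}$ be the standard basis of $\mathbb{R}^{n_1}$ and $\varepsilon_1, \ldots, \varepsilon_{n_1}$ be 
the dual basis [21] of $\mathcal{R}^{n_1}$ which is formed from coordinate functions with respect to the 
basis of $\mathbb{R}^{n_1}$. Likewise, let $\hat{\mathbf{e}}_1, \ldots, \hat{\mathbf{e}}_{n_2}$ be a basis of $\mathbb{R}^{n_2}$ and $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_{n_2}$ be the dual basis 
of $\mathcal{R}^{n_2}$. We have 

$$\varepsilon_i(\mathbf{e}_j) = \delta_{ij} \quad \text{and} \quad \hat{\varepsilon}_i(\hat{\mathbf{e}}_j) = \delta_{ij},$$

where $\delta_{ij}$ is the Kronecker delta function. Thus, $\{\varepsilon_i \otimes \hat{\varepsilon}_j\}$ ($1 \le i \le n_1$, $1 \le j \le n_2$) forms 
a basis for $\mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$. For any second-order tensor $T$, we can write 

$$T = \sum_{i,j} T_{ij}\, \varepsilon_i \otimes \hat{\varepsilon}_j.$$

Given two vectors $\mathbf{a} = \sum_{k=1}^{n_1} a_k \mathbf{e}_k \in \mathbb{R}^{n_1}$ and $\mathbf{b} = \sum_{l=1}^{n_2} b_l \hat{\mathbf{e}}_l \in \mathbb{R}^{n_2}$, we have 

$$T(\mathbf{a}, \mathbf{b}) = \sum_{i,j} T_{ij}\, \varepsilon_i \otimes \hat{\varepsilon}_j \left( \sum_{k=1}^{n_1} a_k \mathbf{e}_k, \sum_{l=1}^{n_2} b_l \hat{\mathbf{e}}_l \right)$$
$$= \sum_{i,j} T_{ij}\, \varepsilon_i\!\left( \sum_{k=1}^{n_1} a_k \mathbf{e}_k \right) \hat{\varepsilon}_j\!\left( \sum_{l=1}^{n_2} b_l \hat{\mathbf{e}}_l \right)$$
$$= \sum_{i,j} T_{ij}\, a_i b_j = \mathbf{a}^T T \mathbf{b}. \tag{2.1}$$

This shows that every second-order tensor in $\mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$ uniquely corresponds to an 
$n_1 \times n_2$ matrix. 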
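
The correspondence between a second-order tensor and its coefficient matrix can be checked numerically. The following is a minimal sketch with arbitrary random data; the helper name `apply_tensor` is hypothetical and introduced only for this illustration. It evaluates the double sum of $T_{ij} a_i b_j$ directly, compares it with the matrix form $\mathbf{a}^T T \mathbf{b}$, and recovers a single coefficient by evaluating the tensor on basis vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 3, 4
T = rng.normal(size=(n1, n2))   # coefficient matrix (T_ij)
a = rng.normal(size=n1)
b = rng.normal(size=n2)

def apply_tensor(T, a, b):
    """Evaluate the bilinear form T(a, b) = sum_ij T_ij a_i b_j."""
    return float(np.einsum("ij,i,j->", T, a, b))

value = apply_tensor(T, a, b)
matrix_form = float(a @ T @ b)          # the same number written as a^T T b

# Evaluating T on standard basis vectors recovers one coefficient T_ij.
e, e_hat = np.eye(n1), np.eye(n2)
coefficient = apply_tensor(T, e[1], e_hat[2])
```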

Note that in this paper, our primary interest is focused on second-order tensors. How- 
ever, most of our conclusions for second-order TPD can be naturally extended to higher 
orders. We will discuss this question in Section 7. 

3. Projection depth 

According to [29], the definition of projection depth can be expressed as follows. 

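Definition 3.1. Let $\mu$ and $\sigma$ be univariate location and scale measures, respectively. 
Define the outlyingness of a point $\mathbf{x} \in \mathbb{R}^p$ with respect to a given distribution function $F$ of $X$ in $\mathbb{R}^p$, 
$p \ge 1$, as 

$$O(\mathbf{x}, F) = \sup_{\|\mathbf{u}\| = 1} \frac{|\mathbf{u}^T \mathbf{x} - \mu(F_{\mathbf{u}})|}{\sigma(F_{\mathbf{u}})}, \tag{3.1}$$

where $F_{\mathbf{u}}$ is the distribution of $\mathbf{u}^T X$. Then, $O(\mathbf{x}, F)$ is defined to be 0 if $\mathbf{u}^T \mathbf{x} - \mu(F_{\mathbf{u}}) = 
\sigma(F_{\mathbf{u}}) = 0$. The projection depth (PD) of a point $\mathbf{x} \in \mathbb{R}^p$ with respect to the given 
$F$, $PD(\mathbf{x}, F)$, is then defined as 

$$PD(\mathbf{x}, F) = \frac{1}{1 + O(\mathbf{x}, F)}. \tag{3.2}$$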

Remark 3.1. Here, we also assume that $\mu$ and $\sigma$ exist uniquely, $\mu$ is translation 
and scale equivariant, and $\sigma$ is scale equivariant and translation invariant, that is, 
$\mu(F_{sY+c}) = s\mu(F_Y) + c$ and $\sigma(F_{sY+c}) = |s|\sigma(F_Y)$, respectively, for any scalars $s$, $c$ and 
random variable $Y \in \mathbb{R}^1$. 

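The most popular outlying function is defined as 

$$O(\mathbf{x}, F) = \sup_{\|\mathbf{u}\| = 1} \frac{|\mathbf{u}^T \mathbf{x} - \mathrm{Med}(F_{\mathbf{u}})|}{\mathrm{MAD}(F_{\mathbf{u}})}, \tag{3.3}$$

where $F_{\mathbf{u}}$ is the distribution of $\mathbf{u}^T X$, $\mathrm{Med}(F_{\mathbf{u}})$ is the median of $F_{\mathbf{u}}$ and $\mathrm{MAD}(F_{\mathbf{u}})$ is the 
median of the distribution of $|\mathbf{u}^T X - \mathrm{Med}(F_{\mathbf{u}})|$. 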

Apart from the good properties of a statistical depth function, this version of PD 
is more robust compared with other depths. However, it is hard to compute for high- 
dimensional samples. 
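
To illustrate why the (Med, MAD) version is costly, the following sketch approximates the supremum in (3.3) by a maximum over randomly drawn unit directions. This random-direction approximation is a commonly used stand-in and is not the computational method of this paper; the function name `projection_depth_mc`, the number of directions and the seed are assumptions made for the example. Because only finitely many directions are searched, the returned value is an upper bound on the sample projection depth, and the number of directions needed for a good bound grows quickly with the dimension.

```python
import numpy as np

def projection_depth_mc(x, samples, n_dirs=2000, seed=0):
    """Approximate PD(x, F_n) for (mu, sigma) = (Med, MAD).

    samples has shape (n, p). The sup over unit vectors is replaced
    by a max over n_dirs random directions (an approximation).
    """
    rng = np.random.default_rng(seed)
    p = samples.shape[1]
    U = rng.normal(size=(n_dirs, p))
    U /= np.linalg.norm(U, axis=1, keepdims=True)    # unit directions
    proj = samples @ U.T                             # shape (n, n_dirs)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)      # assumed nonzero
    outlyingness = np.max(np.abs(U @ x - med) / mad)
    return 1.0 / (1.0 + outlyingness)
```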

Obviously, the variance and mean are also natural choices for $\sigma$ and $\mu$, respectively. It 
is easy to prove that such a projection-based depth is also an ideal depth function. And, 
most importantly, its computation is very simple. 

Theorem 3.1 (Rayleigh projection depth). Let $(\mu, \sigma)$ = (mean, variance), and suppose 
that the second moments of $X$ exist and that $X \sim F$. The solution of the outlying 
function (3.1) is then that of a Rayleigh quotient problem, 

$$O_R(\mathbf{x}, F) = \sup_{\|\mathbf{u}\| = 1} \frac{|\mathbf{u}^T \mathbf{x} - E(\mathbf{u}^T X)|}{\sqrt{E(\mathbf{u}^T X - E(\mathbf{u}^T X))^2}} = \sqrt{\frac{\mathbf{u}_1^T \mathbf{A} \mathbf{u}_1}{\mathbf{u}_1^T \mathbf{B} \mathbf{u}_1}} = \sqrt{\lambda_1},$$

where $\mathbf{A}$ is the matrix $(\mathbf{x} - EX)(\mathbf{x} - EX)^T$, $\mathbf{B}$ is $E(X - EX)(X - EX)^T$, $\lambda_1$ is the 
largest eigenvalue of the generalized eigenvalue problem 

$$\mathbf{A}\mathbf{z} = \lambda \mathbf{B}\mathbf{z}, \quad \mathbf{z} \ne 0,$$

and $\mathbf{u}_1$ is the corresponding eigenvector of $\lambda_1$. 

We call this projection depth the Rayleigh projection depth (RPD). 
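
Because $\mathbf{A}$ has rank one, the largest generalized eigenvalue has the closed form $\lambda_1 = (\mathbf{x} - EX)^T \mathbf{B}^{-1} (\mathbf{x} - EX)$, so the Rayleigh outlyingness coincides with the Mahalanobis distance of $\mathbf{x}$ from the mean. The sketch below checks this on sample data, assuming a positive definite sample covariance; the function names are hypothetical, and the denominator is the standard deviation, as in the formula of Theorem 3.1.

```python
import numpy as np

def rayleigh_outlyingness(x, samples):
    """Closed form sqrt(d^T B^{-1} d) with d = x - mean (samples: shape (n, p))."""
    d = x - samples.mean(axis=0)
    B = np.cov(samples, rowvar=False, bias=True)
    return float(np.sqrt(d @ np.linalg.solve(B, d)))

def rayleigh_outlyingness_eig(x, samples):
    """Same quantity from the generalized eigenproblem A z = lambda B z."""
    d = x - samples.mean(axis=0)
    B = np.cov(samples, rowvar=False, bias=True)
    A = np.outer(d, d)
    # Eigenvalues of B^{-1} A are the generalized eigenvalues of (A, B).
    eigvals = np.linalg.eigvals(np.linalg.solve(B, A))
    return float(np.sqrt(eigvals.real.max()))

def rayleigh_depth(x, samples):
    """Depth 1 / (1 + O_R) as in (3.2)."""
    return 1.0 / (1.0 + rayleigh_outlyingness(x, samples))
```

The closed form needs only one linear solve, which is the sense in which the computation of this depth is simple.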

Remark 3.2. In this paper, for the convenience of computation, the examples in the 
experiments are all based on the Rayleigh projection depth, that is, $(\mu, \sigma)$ = (mean, 
variance). 

Remark 3.3. Obviously, RPD requires the covariance matrix $\mathbf{B}$ to be positive definite. For 
sparse samples, where this requirement fails, we simply project the samples onto their 
nonzero subspace using principal component analysis (PCA). 



4. Tensor projection depth 

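Before describing tensor projection depth, we first review the terminology associated 
with tensor operations [15, 16]. The inner product of tensors $\mathbf{A}$ and $\mathbf{B}$ (with the same 
orders and dimensions) is $\langle \mathbf{A}, \mathbf{B} \rangle = \sum_{i,j} A_{ij} B_{ij}$. The norm of a tensor $\mathbf{A}$ is defined as its 
Frobenius norm, that is, $\|\mathbf{A}\| = \sqrt{\langle \mathbf{A}, \mathbf{A} \rangle}$, and the distance between two tensors $\mathbf{A}$ and 
$\mathbf{B}$ in $\mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$ is defined as $\|\mathbf{A} - \mathbf{B}\|$, where $\mathbf{A} - \mathbf{B} = (A_{ij} - B_{ij})_{n_1 \times n_2}$. 

From the tensorial viewpoint, if we take $\mathcal{X}$ as a random variable in the first-order 
tensor space $\mathcal{R}^{n_1}$, then the outlyingness of the projection depth in Definition 3.1 can be 
expressed as 

$$O(\mathbf{x}, \mathcal{X}) = \sup_{\|\mathbf{u}\| = 1} \frac{|\mathbf{x}(\mathbf{u}) - \mu(\mathcal{X}(\mathbf{u}))|}{\sigma(\mathcal{X}(\mathbf{u}))}.$$

Thus, if $\mathcal{X} \in \mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$ is a random variable, then, according to the formula (2.1), the 
outlying function in the tensor space $\mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$ can be naturally defined as 

$$O(\mathbf{X}, \mathcal{X}) = \sup_{\|\mathbf{u}\| = \|\mathbf{v}\| = 1} \frac{|\mathbf{X}(\mathbf{u}, \mathbf{v}) - \mu(\mathcal{X}(\mathbf{u}, \mathbf{v}))|}{\sigma(\mathcal{X}(\mathbf{u}, \mathbf{v}))} = \sup_{\|\mathbf{u}\| = \|\mathbf{v}\| = 1} \frac{|\mathbf{u}^T \mathbf{X} \mathbf{v} - \mu(\mathbf{u}^T \mathcal{X} \mathbf{v})|}{\sigma(\mathbf{u}^T \mathcal{X} \mathbf{v})}, \tag{4.1}$$

where $\mathbf{u} \in \mathbb{R}^{n_1}$ and $\mathbf{v} \in \mathbb{R}^{n_2}$. 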



Definition 4.1 (Tensor projection depth). The projection depth with outlying function 
given by formula (4.1) is called tensor projection depth. 

For a given univariate location (or "center") measure $\mu$, a distribution function $F_{\mathcal{X}}$ is 
called $\mu$-symmetric about the point $\theta \in \mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$ if $\mu(\mathbf{u}^T \mathcal{X} \mathbf{v}) = \mathbf{u}^T \theta \mathbf{v}$ for any pair of 
unit vectors $\mathbf{u} \in \mathbb{R}^{n_1}$, $\mathbf{v} \in \mathbb{R}^{n_2}$. We have the following theorem. 

Theorem 4.1. Suppose that $\theta$ in $\mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$ is the point of symmetry of a distribution 
$F(\mathcal{X})$ with respect to a given notion of symmetry. The tensor projection depth function 
$TPD(\mathbf{X}, \mathcal{X})$ is: 

1. convex; 

2. symmetric for $\mu$-symmetric $F$; 

3. affine invariant; 

4. monotonic relative to the deepest point; 

5. vanishing at infinity, that is, $TPD(\mathbf{X}, \mathcal{X}) \to 0$ as $\|\mathbf{X}\| \to \infty$; 

6. maximized at the center of $\mu$-symmetric $F$. 

Remark 4.1. Theorem 4.1 shows that TPD is still an ideal depth according to the 
criteria [30]. Furthermore, we can easily obtain many other properties of TPD beyond 
those of the PD in [29], such as the properties of its sample versions and its medians. 
However, these are not the key points of this paper and so we omit any detailed discussion 
here. 



5. Algorithm 

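Suppose that the elements of $S_n = \{\mathbf{X}_1, \ldots, \mathbf{X}_n\}$ are generated from $F$ (where $F_n$ is its 
empirical distribution) and that $\mathbf{X}$ is a fixed tensor. The TPD of $\mathbf{X}$ with respect to $F_n$ 
can then be computed by the following algorithm: 

1. Initialization: Let $\mathbf{u} = (1, \ldots, 1)^T$. 

2. Computing $\mathbf{v}$: Let $\mathbf{x}_{\mathbf{u}} = \mathbf{X}^T \mathbf{u}$ and $F_n^{\mathbf{u}} = \mathbf{u}^T F_n$. Then, $\mathbf{v}$ can be computed by solving 
the vector-based projection depth 

$$\sup_{\|\mathbf{v}\| = 1} \frac{|\mathbf{v}^T \mathbf{x}_{\mathbf{u}} - \mu(F_n^{\mathbf{u}} \mathbf{v})|}{\sigma(F_n^{\mathbf{u}} \mathbf{v})}. \tag{5.1}$$

3. Computing $\mathbf{u}$: Once $\mathbf{v}$ is obtained, let $\mathbf{x}_{\mathbf{v}} = \mathbf{X}\mathbf{v}$ and $F_n^{\mathbf{v}} = F_n \mathbf{v}$. Then, $\mathbf{u}$ can be 
computed by solving the following optimization problem: 

$$\sup_{\|\mathbf{u}\| = 1} \frac{|\mathbf{u}^T \mathbf{x}_{\mathbf{v}} - \mu(\mathbf{u}^T F_n^{\mathbf{v}})|}{\sigma(\mathbf{u}^T F_n^{\mathbf{v}})}. \tag{5.2}$$

4. Iteratively computing $\mathbf{u}$ and $\mathbf{v}$: Using steps 2 and 3, we can iteratively compute $\mathbf{u}$ 
and $\mathbf{v}$ until they tend to converge. 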

Remark 5.1. The optimization problems (5.1) and (5.2) are the same as (3.3) in the 
vector-based projection depth algorithm. Thus, any computational method for the pro- 
jection depth can also be used here. 
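
The four steps above can be sketched for the Rayleigh case, where each vector subproblem has a closed-form solution. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names, the stopping tolerance and the synthetic Gaussian data are hypothetical, the projected sample covariances are assumed to be positive definite, and the denominator is the standard deviation, as in the formula of Theorem 3.1.

```python
import numpy as np

def rayleigh_direction(x, samples):
    """Solve sup_u |u^T x - mean(u^T Y)| / std(u^T Y) for vector samples Y.

    Returns the unit maximizer and the optimal value; the sample
    covariance of Y is assumed to be positive definite.
    """
    d = x - samples.mean(axis=0)
    B = np.cov(samples, rowvar=False, bias=True)
    w = np.linalg.solve(B, d)            # the maximizer is proportional to B^{-1} d
    return w / np.linalg.norm(w), float(np.sqrt(d @ w))

def tpd_rayleigh(X, samples, max_iter=100, tol=1e-12):
    """Alternating scheme; X has shape (n1, n2), samples has shape (n, n1, n2)."""
    u = np.ones(X.shape[0])              # step 1: initialization
    history = []
    for _ in range(max_iter):
        # Step 2: fix u and solve the vector problem (5.1) for v.
        v, _ = rayleigh_direction(X.T @ u, np.einsum("i,kij->kj", u, samples))
        # Step 3: fix v and solve the vector problem (5.2) for u.
        u, value = rayleigh_direction(X @ v, samples @ v)
        history.append(value)
        # Step 4: stop once the objective value no longer increases.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return 1.0 / (1.0 + history[-1]), u, v, history

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 4, 3))   # synthetic 4 x 3 matrix samples
depth, u, v, history = tpd_rayleigh(rng.normal(size=(4, 3)), samples)
```

The list `history` records the objective value after each sweep, which makes it easy to inspect how quickly the iteration settles.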

The following theorem shows that the above algorithm converges. 



Theorem 5.1. The iterative procedure to solve the optimization problems (5.1) and (5.2) 
will monotonically increase the objective function value in (4.1), hence the algorithm 
converges. 

Remark 5.2. Furthermore, if the optimization problem (3.1) is convex, then the solution 
of (4.1) is also globally optimal. For instance, if $(\mu, \sigma)$ = (mean, variance) (the Rayleigh 
projection depth), then its solution is also globally optimal. 

6. Sparse samples 

As with RPD, TPD based on RPD also faces the problem of sparse samples. From 
formulas (5.1) and (5.2), we know that for any sample set $S_n = \{\mathbf{X}_1, \ldots, \mathbf{X}_n\}$ and its 
corresponding empirical distribution $F_n$, the algorithm in the previous section requires 
the covariance matrices of $F_n^{\mathbf{u}}$ and $F_n^{\mathbf{v}}$ to be positive for any $\mathbf{u} \in \mathbb{R}^{n_1}$ and $\mathbf{v} \in \mathbb{R}^{n_2}$. 
However, in practice, the tensor data usually do not satisfy such requirements. 

There are two factors that can lead to such non-positiveness. First, the sample size 
is too small, that is, the size of $S_n$ is less than $n_1$ or $n_2$. Second, the data have some 
common columns or rows (e.g., the images have identical color edges or patterns). In 
the vector space, we usually use PCA to remove the redundant null space of the samples. 
Similarly, we can use the tensor PCA proposed by Cai et al. [1] to reduce the 
dimensionality of the tensor samples. 

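Suppose that $\mathbf{M}_X = \frac{1}{n} \sum_{i=1}^{n} \mathbf{X}_i$, 

$$\mathbf{M}_V = \sum_{i=1}^{n} (\mathbf{X}_i - \mathbf{M}_X)(\mathbf{X}_i - \mathbf{M}_X)^T,$$

$$\mathbf{M}_U = \sum_{i=1}^{n} (\mathbf{X}_i - \mathbf{M}_X)^T (\mathbf{X}_i - \mathbf{M}_X),$$

where the columns of $\mathbf{V}$ are the eigenvectors of $\mathbf{M}_V$, and the columns of $\mathbf{U}$ are the eigenvectors of $\mathbf{M}_U$. 
Thus, the new mappings of $F_n$ can be expressed as 

$$F_n^{(r_1, r_2)} = \{\mathbf{V}_{r_1}^T \mathbf{X}_1 \mathbf{U}_{r_2}, \ldots, \mathbf{V}_{r_1}^T \mathbf{X}_n \mathbf{U}_{r_2}\}, \tag{6.1}$$

where $r_1$ and $r_2$ are the mapping dimensions, and $\mathbf{V}_{r_1}$ and $\mathbf{U}_{r_2}$ are the first $r_1$ and $r_2$ 
columns of $\mathbf{V}$ and $\mathbf{U}$, respectively. Here, we take $r_1$ and $r_2$ to be the ranks of $\mathbf{M}_V$ and 
$\mathbf{M}_U$. 

Theorem 6.1. For any $\mathbf{u} \in \mathbb{R}^{r_1}$, $\mathbf{v} \in \mathbb{R}^{r_2}$ with $\|\mathbf{u}\| = \|\mathbf{v}\| = 1$, the covariance matrices 
of $\mathbf{u}^T F_n^{(r_1, r_2)}$ and $F_n^{(r_1, r_2)} \mathbf{v}$ are always positive. 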
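
The projection (6.1) can be sketched as follows. This is an illustration of the formulas in this section, not the implementation of [1]: the function name `tensor_pca_project` and the relative tolerance used to decide the numerical rank are assumptions made for the example.

```python
import numpy as np

def tensor_pca_project(samples, tol=1e-10):
    """Project matrix samples (shape (n, n1, n2)) as V_r1^T X_i U_r2."""
    centered = samples - samples.mean(axis=0)
    # M_V = sum_i (X_i - M_X)(X_i - M_X)^T, an n1 x n1 matrix.
    M_V = np.einsum("kij,klj->il", centered, centered)
    # M_U = sum_i (X_i - M_X)^T (X_i - M_X), an n2 x n2 matrix.
    M_U = np.einsum("kji,kjl->il", centered, centered)

    def leading_eigenvectors(M):
        w, Q = np.linalg.eigh(M)              # eigenvalues in ascending order
        order = np.argsort(w)[::-1]
        w, Q = w[order], Q[:, order]
        r = int(np.sum(w > tol * w[0]))       # numerical rank (assumed criterion)
        return Q[:, :r]

    V = leading_eigenvectors(M_V)
    U = leading_eigenvectors(M_U)
    projected = np.einsum("ia,kij,jb->kab", V, samples, U)
    return projected, V, U
```

For example, if every sample shares an identical row, the corresponding direction carries no variation and is removed, so the first dimension of the projected samples drops by one.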

7. Higher order tensors 

The algorithm described above takes second-order tensors (i.e., matrices) as input data. 
However, the algorithm can also be extended to higher order tensors. In this section, we 
briefly describe the TPD algorithm for higher order tensors. 
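Let $S_n = \{\mathbf{X}_i, i = 1, \ldots, n\}$ denote the sample set and $F_n$ its empirical distribution, 
where $\mathbf{X}_i \in \mathcal{R}^{n_1} \otimes \cdots \otimes \mathcal{R}^{n_k}$. The outlying function of TPD is then 

$$O(\mathbf{X}, F_n) = \sup_{\|\mathbf{u}_1\| = \cdots = \|\mathbf{u}_k\| = 1} \frac{|\mathbf{X}(\mathbf{u}_1, \ldots, \mathbf{u}_k) - \mu(F_n(\mathbf{u}_1, \ldots, \mathbf{u}_k))|}{\sigma(F_n(\mathbf{u}_1, \ldots, \mathbf{u}_k))}, \tag{7.1}$$

where $\mathbf{u}_i \in \mathbb{R}^{n_i}$. 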

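Before stating the algorithm, we first introduce an item of notation which we will need. 
If $\mathbf{T} \in \mathcal{R}^{n_1} \otimes \cdots \otimes \mathcal{R}^{n_k}$, then for any $\mathbf{a}_l \in \mathbb{R}^{n_l}$, $1 \le l \le k$, we use $\mathbf{T} \times_l \mathbf{a}_l$ to denote a new 
tensor in $\mathcal{R}^{n_1} \otimes \cdots \otimes \mathcal{R}^{n_{l-1}} \otimes \mathcal{R}^{n_{l+1}} \otimes \cdots \otimes \mathcal{R}^{n_k}$, namely 

$$\mathbf{T} \times_l \mathbf{a}_l = \sum_{i_l = 1}^{n_l} T_{i_1, \ldots, i_{l-1}, i_l, i_{l+1}, \ldots, i_k} \cdot a_{i_l}. \tag{7.2}$$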
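
The contraction (7.2) maps directly onto `numpy.tensordot`. The sketch below uses a hypothetical helper name, `mode_product`, and a 1-based mode index to match the notation of this section.

```python
import numpy as np

def mode_product(T, a, l):
    """Contract the l-th index (1-based, as in (7.2)) of T with the vector a.

    The result has one index fewer than T; the remaining indices keep
    their original order.
    """
    return np.tensordot(T, a, axes=([l - 1], [0]))

rng = np.random.default_rng(0)
T = rng.normal(size=(2, 3, 4))     # a third-order tensor
a2 = rng.normal(size=3)
reduced = mode_product(T, a2, 2)   # second-order tensor of shape (2, 4)
```

Note that after one contraction the positions of the later indices shift down by one, so contracting several modes in sequence is simplest when done from the highest mode to the lowest.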

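Thus, the algorithm for higher order tensors can naturally be expressed as follows: 

1. Initialization: Let $\mathbf{u}_i^0 = (x_{i1}, \ldots, x_{i n_i})^T$, $x_{ij} \in \mathbb{R}$, $j = 1, \ldots, n_i$, $i = 1, \ldots, k - 1$. 

2. Computing $\mathbf{u}_k^0$: If we let $\mathbf{x}_k = \mathbf{X} \times_1 \mathbf{u}_1^0 \times_2 \mathbf{u}_2^0 \times \cdots \times_{k-1} \mathbf{u}_{k-1}^0$, then $\mathbf{u}_k^0$ can be 
computed by solving the vector-based projection depth 

$$\sup_{\|\mathbf{u}_k^0\| = 1} \frac{|(\mathbf{u}_k^0)^T \mathbf{x}_k - \mu(F_n \times_1 \mathbf{u}_1^0 \times_2 \mathbf{u}_2^0 \times \cdots \times_{k-1} \mathbf{u}_{k-1}^0 \times_k \mathbf{u}_k^0)|}{\sigma(F_n \times_1 \mathbf{u}_1^0 \times_2 \mathbf{u}_2^0 \times \cdots \times_{k-1} \mathbf{u}_{k-1}^0 \times_k \mathbf{u}_k^0)}. \tag{7.3}$$

3. Computing $\mathbf{u}_{k-1}^1$: Once $\mathbf{u}_k^0$ is obtained, we let $\mathbf{x}_{k-1} = \mathbf{X} \times_1 \mathbf{u}_1^0 \times \cdots \times_{k-2} \mathbf{u}_{k-2}^0 \times_k \mathbf{u}_k^0$ 
and $\mathbf{u}_{k-1}^1$ can be computed by solving the optimization problem 

$$\sup_{\|\mathbf{u}_{k-1}^1\| = 1} \frac{|(\mathbf{u}_{k-1}^1)^T \mathbf{x}_{k-1} - \mu(F_n \times_1 \mathbf{u}_1^0 \times \cdots \times_{k-2} \mathbf{u}_{k-2}^0 \times_{k-1} \mathbf{u}_{k-1}^1 \times_k \mathbf{u}_k^0)|}{\sigma(F_n \times_1 \mathbf{u}_1^0 \times \cdots \times_{k-2} \mathbf{u}_{k-2}^0 \times_{k-1} \mathbf{u}_{k-1}^1 \times_k \mathbf{u}_k^0)}. \tag{7.4}$$

4. Iteratively computing $\mathbf{u}_i$, $i = 1, \ldots, k$, until they tend to converge. 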

Remark 7.1. It is easy to prove that TPD in a higher order tensor space still satisfies 
the above theorems and that its convergence is also guaranteed by Theorem 5.1. 



8. Experiments 

First, we use data classification to demonstrate the validity of the TPD. Consider a 
multivariate data set $C$ that is partitioned into $q$ given classes $C_1, \ldots, C_q$. An additional 
data point $\mathbf{x}$ has to be assigned to one of these classes. The most natural classifier 
provided by [14] is then 

$$\mathrm{class}_d(\mathbf{x}) = \arg\max_{j} D(\mathbf{x} \mid C_j), \tag{8.1}$$

where $D(\mathbf{x} \mid C_j)$ is the depth of $\mathbf{x}$ with respect to class $C_j$, $j = 1, \ldots, q$. This assigns $\mathbf{x}$ 
to the class $C_j$ in which $\mathbf{x}$ is deepest. 

Figure 1. Recognition rates of TPD and PD under different training sample sizes. 
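
The max-depth rule (8.1) can be sketched in a few lines. The depth used here is the vector-based Rayleigh depth in its closed form, chosen only to keep the example short; any depth function with the same signature could be passed instead. The function names and the synthetic two-class data are assumptions made for the illustration.

```python
import numpy as np

def rayleigh_depth(x, samples):
    """Vector-based Rayleigh depth 1 / (1 + sqrt(d^T B^{-1} d)); samples: (n, p)."""
    d = x - samples.mean(axis=0)
    B = np.cov(samples, rowvar=False, bias=True)   # assumed positive definite
    return 1.0 / (1.0 + float(np.sqrt(d @ np.linalg.solve(B, d))))

def max_depth_classify(x, classes, depth=rayleigh_depth):
    """Return the index j that maximizes depth(x, C_j), as in (8.1)."""
    return int(np.argmax([depth(x, C) for C in classes]))

rng = np.random.default_rng(0)
classes = [rng.normal(loc=0.0, size=(100, 2)),     # synthetic class C_1
           rng.normal(loc=6.0, size=(100, 2))]     # synthetic class C_2
label = max_depth_classify(np.array([0.2, -0.1]), classes)
```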

The Columbia Object Image Library (COIL-20) [22] is a database of grayscale images of 
20 objects. The objects were placed on a motorized turntable against a black background. 
The turntable was rotated through 360 degrees to vary the object pose with respect to 
a fixed camera. Images of the objects were taken at pose intervals of 5 degrees. This 
corresponds to 72 images with the dimensions of 32 x 32 pixels per object. Here, we only 
take the first 10 objects as examples. 

In the experiments, recognition rates under different training sizes are computed by 
means of the following steps: 

1. Select the test sets: Randomly select a test set $X_j^{\mathrm{test}}$ of $p$ samples from the object set $X_j$ for 
each class, where $j = 1, \ldots, 10$. 

2. For each training size $n_k$ and each repetition round $t$: 

• Randomly select the training sets: Randomly select $n_k$ training samples from 
$X_j \setminus X_j^{\mathrm{test}}$ (the remaining samples of $X_j$) for each $j$, $j = 1, \ldots, 10$. 

• Compute the recognition rate: Compute the number $\ell_j$ of correctly recognized samples 
for each test set $X_j^{\mathrm{test}}$ by using the formula (8.1) and compute the overall 
recognition rate by $\eta_t = \sum_{j=1}^{10} \ell_j / (10p)$. 

3. Compute the mean and variance of $\eta_t$. 
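
The evaluation protocol above can be sketched as a repeated random-split loop. This is an illustration under stated assumptions rather than the code used for the reported results: the function name `recognition_rates`, its parameters and the `nearest_mean` classifier (a hypothetical stand-in for a depth-based classifier) are introduced only for the example, and the data are synthetic.

```python
import numpy as np

def recognition_rates(class_data, classify, p, train_size, rounds, seed=0):
    """Mean and variance of the recognition rate over repeated rounds.

    class_data: list of arrays, one per class, samples along axis 0.
    classify(x, train_sets) must return a predicted class index.
    """
    rng = np.random.default_rng(seed)
    # Step 1: fix p test samples per class; the rest form the training pool.
    perms = [rng.permutation(len(X)) for X in class_data]
    tests = [X[idx[:p]] for X, idx in zip(class_data, perms)]
    pools = [X[idx[p:]] for X, idx in zip(class_data, perms)]
    rates = []
    for _ in range(rounds):
        # Step 2: draw the training sets and count correct decisions.
        train = [P[rng.choice(len(P), size=train_size, replace=False)] for P in pools]
        correct = sum(classify(x, train) == j
                      for j, T in enumerate(tests) for x in T)
        rates.append(correct / (len(class_data) * p))
    # Step 3: summarize over the rounds.
    return float(np.mean(rates)), float(np.var(rates))

def nearest_mean(x, train):
    """Hypothetical stand-in classifier: closest class mean."""
    return int(np.argmin([np.linalg.norm(x - C.mean(axis=0)) for C in train]))

rng = np.random.default_rng(1)
data = [rng.normal(loc=0.0, size=(30, 2)), rng.normal(loc=10.0, size=(30, 2))]
mean_rate, var_rate = recognition_rates(data, nearest_mean, p=7, train_size=10, rounds=5)
```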
Table 1. The mean, deviation and variance of the recognition rates by TPD and PD with the 
COIL-20 set 

Training   Tensor projection depth                Projection depth 
size       Mean    Min.    Max.    Variance       Mean    Min.    Max.    Variance 
25         0.3638  0.2571  0.4143  0.0018         0.0695  0.0286  0.1571  0.0017 
30         0.6571  0.5286  0.7429  0.0028         0.1333  0.0571  0.2429  0.0028 
35         0.7857  0.7000  0.8571  0.0017         0.1526  0.0429  0.2571  0.0044 
40         0.8390  0.7714  0.9000  0.0010         0.3010  0.1714  0.3714  0.0029 
45         0.8610  0.7714  0.9429  0.0019         0.4410  0.3571  0.5571  0.0041 
50         0.8990  0.8429  0.9286  0.0008         0.5124  0.4429  0.6000  0.0017 
55         0.9086  0.8714  0.9571  0.0006         0.5419  0.4429  0.6286  0.0021 


Here, $p = 7$ and the training size equals 25, 30, 35, 40, 45, 50, 55, respectively. The 
results are shown in Figure 1 and Table 1. 

From Figure 1 and Table 1, we can see that for such samples with intrinsic tensor 
form, TPD performs better than PD. A question then naturally arises: If the data sets 
are naturally in vector form, how does TPD perform compared with PD? We will answer 
the question by means of the following experiment. 

We consider the famous Iris data [9], which contains measurements of four different 
features (sepal length, sepal width, petal length and petal width) for each of 150 obser- 
vations from three different types of iris plant: (1) setosa; (2) virginica; (3) versicolor. 
We randomly choose 10 observations from each class to construct the test sets and then 
randomly select 10, 15, 20, 25, 30, 35, 40 samples from the remaining observations as the 
respective training sets. For the computation of TPD, each sample is reshaped into a 2 x 2 matrix. 

From Table 2 we can see that there is no apparent difference between the two results. 
Therefore, data from vector spaces can be converted into tensors and we can perform the 
depth processing with TPD. 



Table 2. The mean, deviation and variance of the recognition rates by TPD and PD with the 
Iris set 

Training   Tensor projection depth                Projection depth 
size       Mean    Min.    Max.    Variance       Mean    Min.    Max.    Variance 
10         0.9698  0.8571  1.0000  0.0016         0.9476  0.7619  1.0000  0.0041 
15         0.9889  0.9524  1.0000  0.0004         0.9841  0.9524  1.0000  0.0005 
20         0.9952  0.9048  1.0000  0.0004         0.9921  0.8571  1.0000  0.0008 
25         0.9984  0.9524  1.0000  0.0001         0.9984  0.9524  1.0000  0.0001 
30         0.9984  0.9524  1.0000  0.0001         0.9984  0.9524  1.0000  0.0001 
35         1.0000  1.0000  1.0000  0.0000         1.0000  1.0000  1.0000  0.0000 
40         1.0000  1.0000  1.0000  0.0000         1.0000  1.0000  1.0000  0.0000 

9. Conclusion 

In this paper, tensor projection depth is proposed as an extension of the definition of 
depth to tensor spaces. We show that, according to the criteria [30], TPD satisfies all 
four desirable properties. TPD has the advantages of avoiding the curse of dimensionality 
and keeping the natural structures of the data sets invariant. For sparse samples, we use 
tensor PCA to remove their null space and compute the TPD in the subspace. The 
numerical results show that TPD performs better than PD for data which are naturally 
in tensor form. 

Data sets which are naturally in vector form can also be processed using TPD, which 
converts the data into tensor form. Although such processing will actually change the 
structure of the data sets to some extent, numerical results show that there are no 
apparent differences in the outcome. For some $(\mu, \sigma)$, such tensor-based processing can 
effectively decrease the computational complexity of PD caused by the dimensionality. 



Appendix: Proofs 

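Proof of Theorem 4.1. Convexity. We will show that the outlying function (4.1) is 
still convex. Let $\mathbf{X}_1, \mathbf{X}_2 \in \mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$ be two arbitrary points, $0 < \lambda < 1$, and for the point 
$\mathbf{X}_0 = (1 - \lambda)\mathbf{X}_1 + \lambda\mathbf{X}_2$, we have 

$$|\mathbf{u}^T \mathbf{X}_0 \mathbf{v} - \mu(\mathbf{u}^T \mathcal{X} \mathbf{v})|$$
$$= |(1 - \lambda)(\mathbf{u}^T \mathbf{X}_1 \mathbf{v} - \mu(\mathbf{u}^T \mathcal{X} \mathbf{v})) + \lambda(\mathbf{u}^T \mathbf{X}_2 \mathbf{v} - \mu(\mathbf{u}^T \mathcal{X} \mathbf{v}))|$$
$$\le (1 - \lambda)|\mathbf{u}^T \mathbf{X}_1 \mathbf{v} - \mu(\mathbf{u}^T \mathcal{X} \mathbf{v})| + \lambda|\mathbf{u}^T \mathbf{X}_2 \mathbf{v} - \mu(\mathbf{u}^T \mathcal{X} \mathbf{v})|$$

and 

$$O(\mathbf{X}_0, \mathcal{X}) = \sup_{\|\mathbf{u}\| = \|\mathbf{v}\| = 1} \frac{|\mathbf{u}^T \mathbf{X}_0 \mathbf{v} - \mu(\mathbf{u}^T \mathcal{X} \mathbf{v})|}{\sigma(\mathbf{u}^T \mathcal{X} \mathbf{v})}$$
$$\le \sup_{\|\mathbf{u}\| = \|\mathbf{v}\| = 1} \frac{(1 - \lambda)|\mathbf{u}^T \mathbf{X}_1 \mathbf{v} - \mu(\mathbf{u}^T \mathcal{X} \mathbf{v})| + \lambda|\mathbf{u}^T \mathbf{X}_2 \mathbf{v} - \mu(\mathbf{u}^T \mathcal{X} \mathbf{v})|}{\sigma(\mathbf{u}^T \mathcal{X} \mathbf{v})}$$
$$\le (1 - \lambda) O(\mathbf{X}_1, \mathcal{X}) + \lambda O(\mathbf{X}_2, \mathcal{X}).$$

Thus, 

$$TPD(\mathbf{X}_0, \mathcal{X}) \ge (1 - \lambda) TPD(\mathbf{X}_1, \mathcal{X}) + \lambda TPD(\mathbf{X}_2, \mathcal{X}).$$

Symmetry. This is straightforward. 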

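Affine invariance. Suppose that $\mathbf{A}_{n_1 \times n_1}$ and $\mathbf{B}_{n_2 \times n_2}$ are any two non-singular matrices. 
We then have 

$$O(\mathbf{A}\mathbf{X}\mathbf{B}, \mathbf{A}\mathcal{X}\mathbf{B}) = \sup_{\|\mathbf{u}\| = \|\mathbf{v}\| = 1} \frac{|\mathbf{u}^T \mathbf{A}\mathbf{X}\mathbf{B} \mathbf{v} - \mu(\mathbf{u}^T \mathbf{A}\mathcal{X}\mathbf{B} \mathbf{v})|}{\sigma(\mathbf{u}^T \mathbf{A}\mathcal{X}\mathbf{B} \mathbf{v})}.$$

For fixed $\mathbf{X}$, suppose that 

$$(\mathbf{u}_0, \mathbf{v}_0) = \arg\sup_{\|\mathbf{u}\| = \|\mathbf{v}\| = 1} O(\mathbf{X}, \mathcal{X}).$$

Thus, if we fix $\mathbf{v} = \mathbf{v}_0$ and let 

$$\mathbf{u}_1 = \arg\sup_{\|\mathbf{u}\| = 1} \frac{|\mathbf{u}^T \mathbf{A}(\mathbf{X}\mathbf{v}_0) - \mu(\mathbf{u}^T \mathbf{A}(\mathcal{X}\mathbf{v}_0))|}{\sigma(\mathbf{u}^T \mathbf{A}(\mathcal{X}\mathbf{v}_0))},$$

then, according to Theorem 2.1 in [29], 

$$\frac{|\mathbf{u}_1^T \mathbf{A}(\mathbf{X}\mathbf{v}_0) - \mu(\mathbf{u}_1^T \mathbf{A}(\mathcal{X}\mathbf{v}_0))|}{\sigma(\mathbf{u}_1^T \mathbf{A}(\mathcal{X}\mathbf{v}_0))} = \frac{|\mathbf{u}_0^T (\mathbf{X}\mathbf{v}_0) - \mu(\mathbf{u}_0^T (\mathcal{X}\mathbf{v}_0))|}{\sigma(\mathbf{u}_0^T (\mathcal{X}\mathbf{v}_0))}.$$

Thus, $\mathbf{u}_1^T \mathbf{A} = \lambda \mathbf{u}_0^T$, where $\lambda \in \mathbb{R}^1$, and we have 

$$\sup_{\|\mathbf{v}\| = 1} \frac{|\mathbf{u}_1^T \mathbf{A}(\mathbf{X}\mathbf{v}) - \mu(\mathbf{u}_1^T \mathbf{A}(\mathcal{X}\mathbf{v}))|}{\sigma(\mathbf{u}_1^T \mathbf{A}(\mathcal{X}\mathbf{v}))} = \sup_{\|\mathbf{v}\| = 1} \frac{|\lambda \mathbf{u}_0^T (\mathbf{X}\mathbf{v}) - \mu(\lambda \mathbf{u}_0^T (\mathcal{X}\mathbf{v}))|}{\sigma(\lambda \mathbf{u}_0^T (\mathcal{X}\mathbf{v}))}$$
$$= \sup_{\|\mathbf{v}\| = 1} \frac{|\mathbf{u}_0^T (\mathbf{X}\mathbf{v}) - \mu(\mathbf{u}_0^T (\mathcal{X}\mathbf{v}))|}{\sigma(\mathbf{u}_0^T (\mathcal{X}\mathbf{v}))}.$$

Therefore, 

$$\mathbf{v}_1 = \arg\sup_{\|\mathbf{v}\| = 1} \frac{|\mathbf{u}_1^T \mathbf{A}(\mathbf{X}\mathbf{v}) - \mu(\mathbf{u}_1^T \mathbf{A}(\mathcal{X}\mathbf{v}))|}{\sigma(\mathbf{u}_1^T \mathbf{A}(\mathcal{X}\mathbf{v}))} = \mathbf{v}_0.$$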

Similarly, 



|uf AXv-/i(uf AAV)| _ |uf AXBv - //(uf AAfBv) 



|u T AXBv - ^(u T AA-Bv)| 
l|u||=H|=i cr(u T AA'Bv) 

_ |u^AXBv -^(u^AA'Bv )| 
t«A^Bv ) 

The result then follows. 

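Monotonicity relative to deepest point. Suppose that $\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_c \in \mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$, $\mathbf{X}_c$ is 
the deepest tensor and $\mathbf{X}_1 = (1 - \lambda)\mathbf{X}_2 + \lambda\mathbf{X}_c$, $\lambda \in [0, 1]$. Then, since convexity gives 

$$O(\mathbf{X}_1, \mathcal{X}) \le (1 - \lambda) O(\mathbf{X}_2, \mathcal{X}) + \lambda O(\mathbf{X}_c, \mathcal{X})$$

and $O(\mathbf{X}_c, \mathcal{X}) \le O(\mathbf{X}_1, \mathcal{X})$, we have 

$$(1 - \lambda) O(\mathbf{X}_1, \mathcal{X}) \le O(\mathbf{X}_1, \mathcal{X}) - \lambda O(\mathbf{X}_c, \mathcal{X}) \le (1 - \lambda) O(\mathbf{X}_2, \mathcal{X}).$$

Thus, $O(\mathbf{X}_1, \mathcal{X}) \le O(\mathbf{X}_2, \mathcal{X})$ and $TPD(\mathbf{X}_1, \mathcal{X}) \ge TPD(\mathbf{X}_2, \mathcal{X})$, that is, the tensor 
projection depth decreases monotonically along any ray emanating from the deepest point. 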

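Maximality at center. Suppose that $F$ is $\mu$-symmetric about a unique point $\mathbf{X}_c \in 
\mathcal{R}^{n_1} \otimes \mathcal{R}^{n_2}$. Then, for any pair of unit vectors $\mathbf{u}$, $\mathbf{v}$, we have $\mu(\mathbf{u}^T \mathcal{X} \mathbf{v}) = \mathbf{u}^T \mathbf{X}_c \mathbf{v}$ and the 
result follows. 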

Vanishing at infinity. This is straightforward. □ 
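Proof of Theorem 5.1. Define, for unit vectors $\mathbf{u}$ and $\mathbf{v}$, 

$$f(\mathbf{u}, \mathbf{v}) = \frac{|\mathbf{X}(\mathbf{u}, \mathbf{v}) - \mu(F_n(\mathbf{u}, \mathbf{v}))|}{\sigma(F_n(\mathbf{u}, \mathbf{v}))}.$$

Let $\mathbf{u}_0$ be the initial value. Fixing $\mathbf{u}_0$, we get $\mathbf{v}_0$ by solving the optimization problem 
(5.1). 

Likewise, fixing $\mathbf{v}_0$, we get $\mathbf{u}_1$ by solving the optimization problem (5.2). Thus, we 
have 

$$f(\mathbf{u}_0, \mathbf{v}_0) \le f(\mathbf{u}_1, \mathbf{v}_0).$$

Finally, we get 

$$f(\mathbf{u}_0, \mathbf{v}_0) \le f(\mathbf{u}_1, \mathbf{v}_0) \le f(\mathbf{u}_1, \mathbf{v}_1) \le f(\mathbf{u}_2, \mathbf{v}_1) \le \cdots.$$

Since $f$ is bounded, the sequence of objective values converges. □ 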